Introduction to Time Series Analysis - 04
This note is for the course MATH 545 at McGill University.
Lectures 10 to 12
Estimation of the mean $\mu$, the autocovariance function $\gamma(h)$, and the autocorrelation function $\rho(h)$ for a stationary process $\{X_t\}$ with $E[X_t]=\mu$.
$$\hat{\mu}=\bar{X}_{n}=\frac{X_{1}+\cdots+X_{n}}{n}$$

$$E\left[\bar{X}_{n}\right]=\frac{E\left[X_{1}\right]+\cdots+E\left[X_{n}\right]}{n}=\mu$$

$$E\left[\left(\bar{X}_{n}-\mu\right)^{2}\right]=\operatorname{MSE}\left(\bar{X}_{n}\right)=\operatorname{Var}\left(\bar{X}_{n}\right)$$
$$\begin{aligned} \operatorname{Var}\left(\bar{X}_{n}\right) &=\operatorname{Var}\left(\frac{1}{n}\left[X_{1}+\cdots +X_{n}\right]\right) \\ &=\frac{1}{n^{2}} \operatorname{Var}\left(X_{1}+\cdots+X_{n}\right) \\ &=\frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n}\operatorname{Cov}\left(X_{i}, X_{j}\right) \\ &=\frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} \gamma(i-j) \\ &=\frac{1}{n^{2}} \sum_{h=-n}^{n}(n-|h|) \gamma(h) \\ &=\frac{1}{n} \sum_{h=-n}^{n}\left(1-\frac{|h|}{n}\right) \gamma(h) \end{aligned}$$
(Note that here $(n-|h|)$ is the number of observed pairs at lag $h$. Since $0\leq |i-j| \leq n-1$, there are $n$ possible values of $|i-j|$.)
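As a quick sanity check of this identity, here is a minimal numerical sketch comparing the double sum with the weighted single sum, assuming (purely for concreteness) the AR(1) autocovariance $\gamma(h)=\sigma^2\phi^{|h|}/(1-\phi^2)$ and arbitrary values of $\phi$, $\sigma$, $n$:

```python
import numpy as np

# Check Var(X̄_n) = (1/n) * sum_{|h|<n} (1 - |h|/n) * gamma(h) against the
# raw double sum (1/n^2) * sum_i sum_j gamma(i - j).
phi, sigma, n = 0.6, 1.0, 200            # arbitrary illustrative values

def gamma(h):
    return sigma**2 * phi**abs(h) / (1 - phi**2)   # assumed AR(1) autocovariance

# Weighted single sum over lags |h| < n (the last line of the derivation).
var_formula = sum((1 - abs(h) / n) * gamma(h) for h in range(-(n - 1), n)) / n

# Double sum before collecting equal lags.
ii, jj = np.meshgrid(np.arange(n), np.arange(n))
var_double = (sigma**2 * phi ** np.abs(ii - jj) / (1 - phi**2)).sum() / n**2

print(var_formula, var_double)           # identical up to floating-point error
```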
If $\gamma(h) \rightarrow 0$ as $h \rightarrow \infty$, then $\operatorname{Var}\left(\bar{X}_{n}\right)=\operatorname{MSE}\left(\bar{X}_{n}\right) \rightarrow 0$ as $n \rightarrow \infty$.
If $\sum_{h=-\infty}^{\infty}|\gamma(h)|<\infty$, then $\lim_{n \rightarrow \infty} n \operatorname{Var}\left(\bar{X}_{n}\right)=\sum_{h=-\infty}^{\infty}\gamma(h)$.
If $\{X_t\}$ is also Gaussian, then $\bar{X}_{n} \sim N\left(\mu,\ \frac{1}{n} \sum_{|h|<n}\left(1-\frac{|h|}{n}\right) \gamma(h)\right)$.
To do testing and interval estimation, use $\bar{X}_{n} \pm z_{\alpha / 2} \frac{\sqrt{\Gamma}}{\sqrt{n}}$, where $\Gamma = \sum_{|h| < \infty} \gamma(h)$.
In practice, plug in $\hat{\Gamma}=\sum_{|h|}\hat{\gamma}(h)$, i.e., replace the $\gamma(h)$ by sample autocovariances summed over a suitable range of lags.
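A minimal sketch of this plug-in interval, assuming the truncation lag for $\hat{\Gamma}$ is a user choice (the helper names `sample_acvf` and `mean_ci` are illustrative, not from the notes):

```python
import numpy as np

def sample_acvf(x, h):
    """gamma_hat(h) = (1/n) * sum_{t=1}^{n-|h|} (x_{t+|h|} - xbar)(x_t - xbar)."""
    x = np.asarray(x, dtype=float)
    n, h, xbar = len(x), abs(h), x.mean()
    return np.sum((x[h:] - xbar) * (x[: n - h] - xbar)) / n

def mean_ci(x, max_lag, z=1.96):
    """Approximate CI for the mean: xbar +/- z * sqrt(Gamma_hat / n), where
    Gamma_hat truncates the sum of gamma_hat(h) at |h| <= max_lag
    (the truncation point is an assumption, not fixed by the notes)."""
    n = len(x)
    gamma_sum = sample_acvf(x, 0) + 2 * sum(sample_acvf(x, h) for h in range(1, max_lag + 1))
    half = z * np.sqrt(max(gamma_sum, 0.0) / n)   # guard against a negative truncated sum
    return float(np.mean(x)) - half, float(np.mean(x)) + half
```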
Example: AR(1) process
Let $\{X_t\}$ be defined by $X_t - \mu = \phi(X_{t-1}-\mu) + Z_t$, where $|\phi| < 1$ and $\{Z_t\}\sim WN(0, \sigma^2)$.
So,

$$\begin{aligned} \Gamma &=\sum_{|h| < \infty} \gamma(h) = \sum_{|h| < \infty} \frac{\sigma^2 \phi^{|h|}}{1-\phi^2} \\ &=\left(1+2\sum^{\infty}_{h=1} \phi^{h}\right)\frac{\sigma^2}{1-\phi^2} \\ &=\left(1+\frac{2\phi}{1-\phi}\right)\frac{\sigma^2}{1-\phi^2} \\ &=\frac{1-\phi+2\phi}{1-\phi}\cdot\frac{\sigma^2}{(1+\phi)(1-\phi)} \\ &=\frac{\sigma^2}{(1-\phi)^2} \end{aligned}$$
So an approximate 95% confidence interval for $\mu$ is:

$$\bar{X}_n \pm \frac{1.96}{\sqrt{n}}\frac{\sigma}{\sqrt{(1-\phi)^2}} = \bar{X}_n \pm \frac{1.96}{\sqrt{n}}\frac{\sigma}{|1-\phi|}$$
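A small simulation sketch of this interval, assuming Gaussian noise and arbitrary illustrative values of $\phi$, $\sigma$, $\mu$, $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma, mu, n = 0.6, 1.0, 5.0, 500   # arbitrary illustrative values

# Simulate an AR(1): X_t - mu = phi (X_{t-1} - mu) + Z_t, Z_t ~ N(0, sigma^2).
x = np.empty(n)
x[0] = mu + rng.normal(scale=sigma / np.sqrt(1 - phi**2))   # stationary start
for t in range(1, n):
    x[t] = mu + phi * (x[t - 1] - mu) + rng.normal(scale=sigma)

# 95% CI using Gamma = sigma^2 / (1 - phi)^2 from the derivation above.
half = 1.96 / np.sqrt(n) * sigma / abs(1 - phi)
print("95% CI for mu:", (x.mean() - half, x.mean() + half))
```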
The sample autocovariance and autocorrelation functions are

$$\hat{\gamma}(h)=\frac{1}{n}\sum_{t=1}^{n-|h|}(X_{t+|h|}-\bar{X}_n)(X_t-\bar{X}_n), \qquad \hat{\rho}(h)=\frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}$$
Both are biased in finite samples but consistent.
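A direct transcription of these two estimators into code (the function name `acf` is just an illustrative choice):

```python
import numpy as np

def acf(x, max_lag):
    """Sample ACF: rho_hat(h) = gamma_hat(h) / gamma_hat(0) for h = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    gamma_hat = np.array([
        np.sum((x[h:] - xbar) * (x[: n - h] - xbar)) / n   # divide by n, not n - h
        for h in range(max_lag + 1)
    ])
    return gamma_hat / gamma_hat[0]
```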
$$\hat{\Gamma}_k = \begin{bmatrix} \hat{\gamma}(0) & \hat{\gamma}(1) & \cdots & \hat{\gamma}(k-1) \\ \hat{\gamma}(1) & \hat{\gamma}(0) & \cdots & \hat{\gamma}(k-2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\gamma}(k-1) & \hat{\gamma}(k-2) & \cdots & \hat{\gamma}(0) \end{bmatrix}$$
For all $k<n$, $\hat{\Gamma}_k$ will be non-negative definite. It is not obvious how to estimate it for $k\geq n$, and even for $k$ close to $n$, $\hat{\Gamma}_k$ will be unstable.
(Box-Jenkins rule of thumb: need $n\geq 50$ and $h\leq \frac{n}{4}$.)
In large samples and for lags that are not too large, we can approximate the distribution of $(\hat{\rho}(1), \cdots, \hat{\rho}(k))$ by:

$$\hat{\rho} \sim MVN\left(\rho, \tfrac{1}{n}W\right)$$

where $W$ is a $k\times k$ covariance matrix with elements computed by a simplification of Bartlett's formula:

$$w_{ij}=\sum^\infty_{k=1}\{\rho(k+i)+\rho(k-i)-2\rho(k)\rho(i)\}\times\{\rho(k+j)+\rho(k-j)-2\rho(k)\rho(j)\}$$
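A sketch that evaluates this formula numerically, assuming the infinite sum is truncated at a user-chosen number of terms and that the true ACF $\rho(\cdot)$ is supplied as a function (the name `bartlett_w` is illustrative):

```python
import numpy as np

def bartlett_w(rho, k, n_terms=200):
    """Approximate the k x k matrix W from the formula above, truncating the
    infinite sum at n_terms terms; rho is the true ACF, callable at integer lags."""
    W = np.zeros((k, k))
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            W[i - 1, j - 1] = sum(
                (rho(m + i) + rho(m - i) - 2 * rho(m) * rho(i))
                * (rho(m + j) + rho(m - j) - 2 * rho(m) * rho(j))
                for m in range(1, n_terms + 1)
            )
    return W
```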
Example
$\{X_t\}$ iid $\Rightarrow$ $\rho(0)=1$ and $\rho(h)=0$ for $|h|\geq 1$.
So $w_{ij}=\sum^\infty_{k=1}\rho(k-i)\rho(k-j)$.
We have $w_{ii}=1$ and $w_{ij}=0$ for $i\neq j$, so $W=I$ and $\hat{\rho}(h) \sim N(0, \frac{1}{n})$ approximately.
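A quick empirical illustration of the resulting $\pm 1.96/\sqrt{n}$ white-noise bands, assuming simulated iid Gaussian data and arbitrary choices of $n$ and the maximum lag:

```python
import numpy as np

rng = np.random.default_rng(1)
n, max_lag = 400, 20
z = rng.normal(size=n)                     # iid data

xbar = z.mean()
gamma_hat = np.array([np.sum((z[h:] - xbar) * (z[: n - h] - xbar)) / n
                      for h in range(max_lag + 1)])
rho_hat = gamma_hat / gamma_hat[0]

# Under iid-ness, rho_hat(h) ~ N(0, 1/n), so roughly 95% of the sample
# autocorrelations at lags h >= 1 should fall within +/- 1.96 / sqrt(n).
bound = 1.96 / np.sqrt(n)
print(np.mean(np.abs(rho_hat[1:]) < bound))
```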
Now consider an MA(1) process $X_t=Z_t+\theta Z_{t-1}$, where $\{Z_t\} \sim WN(0, \sigma^2)$.
Then $X_t=\sum^\infty_{k=0}\psi_k Z_{t-k}$ with

$$\psi_k = \begin{cases} 1, & k=0 \\ \theta, & k=1 \\ 0, & k\notin\{0, 1\} \end{cases}$$

So

$$\gamma_X(h)=\sum^\infty_{j=-\infty} \psi_j \psi_{j-h}\sigma^2 = \begin{cases}(1+\theta^2)\sigma^2, & h=0 \\ \theta\sigma^2, & |h|=1 \\ 0, & |h|\geq 2 \end{cases}$$
So $\rho(0) = 1$ and $\rho(\pm 1) = \frac{\gamma(1)}{\gamma(0)} = \frac{\theta}{1+\theta^2}$.
From the formula $w_{ij}=\sum^\infty_{k=1}\{\rho(k+i)+\rho(k-i)-2\rho(k)\rho(i)\}\times\{\rho(k+j)+\rho(k-j)-2\rho(k)\rho(j)\}$, we have for $i=j$:

$$w_{ii}=\sum^\infty_{k=1}\{\rho(k+i)+\rho(k-i)-2\rho(k)\rho(i)\}^2$$
$$w_{11}=\big(\rho(0)-2\rho(1)^2\big)^2 + \rho(1)^2 = 1-3\rho(1)^2+4\rho(1)^4$$
So, for the MA(1), $\hat{\rho}(1) \sim N\left(\frac{\theta}{1+\theta^2},\ \frac{1}{n}\big(1-3\rho(1)^2+4\rho(1)^4\big)\right)$ approximately.
For $i>1$,

$$w_{ii}=\sum^\infty_{k=1}\{\rho(k+i)+\rho(k-i)-2\rho(k)\rho(i)\}^2 =\rho(0)^2 + \rho(1)^2 + \rho(-1)^2 = 1+2\rho(1)^2$$
(Note: in this case, since $i\geq 2$ and $k\geq 1$, $\rho(k+i)$ is always $0$ and $\rho(k)\rho(i)$ is always $0$, while $\rho(k-i)$ is nonzero only when $k = i-1,\ i,\ i+1$.)
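A numerical check of both MA(1) variance expressions against a truncated version of Bartlett's sum, assuming an arbitrary illustrative value of $\theta$:

```python
import numpy as np

theta = 0.5                                   # illustrative MA(1) coefficient
rho1 = theta / (1 + theta**2)

def rho(h):
    h = abs(h)
    return 1.0 if h == 0 else (rho1 if h == 1 else 0.0)   # MA(1) ACF

def w_ii(i, n_terms=50):
    """Diagonal Bartlett term, truncating the infinite sum at n_terms terms."""
    return sum((rho(k + i) + rho(k - i) - 2 * rho(k) * rho(i)) ** 2
               for k in range(1, n_terms + 1))

print(w_ii(1), 1 - 3 * rho1**2 + 4 * rho1**4)   # agree
print(w_ii(3), 1 + 2 * rho1**2)                 # agree for any i > 1
```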
Prediction (Forecasting)
Goal: find the linear combination of $X_1, \cdots, X_n$ that forecasts $X_{n+h}$ with minimum MSE.
Write the best linear predictor of $X_{n+h}$ as:

$$P_nX_{n+h} = a_0+a_1X_n+a_2X_{n-1}+\cdots+a_nX_1$$
We want to find $a_0, a_1, \cdots, a_n$ to minimize:

$$S(a_0, a_1, \cdots, a_n) = E\big[\big(X_{n+h} - (a_0+a_1X_n+a_2X_{n-1}+\cdots+a_nX_1)\big)^2\big]$$
$S$ is quadratic and bounded below by zero, so there exists at least one solution to $\frac{\partial}{\partial a_j}S(a_0, \cdots, a_n)=0$ for $j = 0, 1, \cdots, n$.
Taking derivatives (assuming we can safely interchange $\frac{\partial}{\partial a_j}$ and $E[\cdot]$) gives:
$$\frac{\partial}{\partial a_0}S(a_0, \cdots, a_n)=0 \;\Rightarrow\; E\Big[X_{n+h} - a_0 - \sum^n_{i=1}a_iX_{n+1-i}\Big]=0 \tag{1}$$

$$\frac{\partial}{\partial a_j}S(a_0, \cdots, a_n)=0 \;\Rightarrow\; E\Big[\Big(X_{n+h} - a_0 - \sum^n_{i=1}a_iX_{n+1-i}\Big)X_{n+1-j}\Big]=0, \quad j = 1, \cdots, n \tag{2}$$
From equation 1 we have:
$$a_0 = E(X_{n+h}) - \sum_{i=1}^n a_i E(X_{n+1-i}) = \mu-\mu \sum^n_{i=1} a_i$$

since $\{X_t\}$ is stationary.
Plugging this into equation 2, we have:
$$E\Big[\Big(X_{n+h} - \mu\Big(1-\sum^n_{i=1} a_i\Big) - \sum^n_{i=1}a_iX_{n+1-i}\Big)X_{n+1-j}\Big] = 0$$

$$E\big[(X_{n+h}-\mu)X_{n+1-j}\big]=E\Big[\sum^n_{i=1}a_i(X_{n+1-i}-\mu)X_{n+1-j}\Big]$$

$$\gamma(h+j-1) = \sum^n_{i=1}a_i E\big[(X_{n+1-i}-\mu)X_{n+1-j}\big] = \sum^n_{i=1}a_i \gamma(i-j)$$
We have a matrix form of this formula: $\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{\gamma_n}(h)$, where $(\Gamma_n)_{ij} = \gamma(i-j)$ and $\big(\underset{\sim}{\gamma_n}(h)\big)_j = \gamma(h+j-1)$.
Therefore $P_nX_{n+h} = \mu + \sum^n_{i=1}a_i(X_{n+1-i}-\mu)$, where $\underset{\sim}{a_n}$ satisfies $\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{\gamma_n}(h)$.
Obviously, $E\big[X_{n+h} - \big(\mu + \sum^n_{i=1}a_i(X_{n+1-i}-\mu)\big)\big] = 0$.
$$\begin{aligned}\operatorname{MSE} &=E\big((X_{n+h} - P_nX_{n+h})^2\big) \\&=E\Big(\big(X_{n+h} - \mu - \sum^n_{i=1}a_i(X_{n+1-i}-\mu)\big)^2\Big) \\&=E\Big[\Big((X_{n+h} - \mu) - \sum^n_{i=1}a_i(X_{n+1-i}-\mu)\Big)^2\Big] \\&=E\big((X_{n+h} - \mu)^2\big) -2E\Big((X_{n+h} - \mu)\sum^n_{i=1}a_i(X_{n+1-i}-\mu)\Big) + E\Big(\Big(\sum^n_{i=1}a_i(X_{n+1-i}-\mu)\Big)^2\Big) \\&=\gamma(0) - 2\sum^n_{i=1} a_i E\big((X_{n+h}-\mu)(X_{n+1-i}-\mu)\big)+\sum^n_{i=1}\sum^n_{j=1}a_i\,E\big((X_{n+1-i}-\mu)(X_{n+1-j}-\mu)\big)\,a_j \\&=\gamma(0) - 2\sum^n_{i=1}a_i\gamma(h+i-1) + \sum^n_{i=1}\sum^n_{j=1}a_i\gamma(i-j)a_j \\&=\gamma(0) - 2\sum^n_{i=1}a_i\gamma(h+i-1) + \sum^n_{i=1}a_i\Big(\sum^n_{j=1}\gamma(i-j)a_j\Big) \\&=\gamma(0) - 2\underset{\sim}{a_n}^T\underset{\sim}{\gamma_n}(h) + \underset{\sim}{a_n}^T\Gamma_n\underset{\sim}{a_n} \\&=\gamma(0) - \underset{\sim}{a_n}^T\underset{\sim}{\gamma_n}(h) \end{aligned}$$
(Note: because $\underset{\sim}{a_n}$ satisfies $\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{\gamma_n}(h)$, we have $\underset{\sim}{a_n}^T\Gamma_n\underset{\sim}{a_n} = \underset{\sim}{a_n}^T\underset{\sim}{\gamma_n}(h)$.)
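A minimal sketch of the resulting recipe, solve $\Gamma_n \underset{\sim}{a_n}=\underset{\sim}{\gamma_n}(h)$ and read off the MSE, assuming the autocovariance function is supplied and $\Gamma_n$ is non-singular (the function name is illustrative):

```python
import numpy as np

def best_linear_predictor(gamma, n, h):
    """Coefficients a_n and MSE of the best linear predictor P_n X_{n+h}:
    solve Gamma_n a_n = gamma_n(h) with (Gamma_n)_{ij} = gamma(i - j) and
    (gamma_n(h))_j = gamma(h + j - 1).  gamma is assumed symmetric, so
    gamma(|i - j|) is used when building Gamma_n."""
    idx = np.arange(1, n + 1)
    Gamma_n = np.array([[gamma(abs(i - j)) for j in idx] for i in idx], dtype=float)
    gamma_nh = np.array([gamma(h + j - 1) for j in idx], dtype=float)
    a = np.linalg.solve(Gamma_n, gamma_nh)
    mse = gamma(0) - a @ gamma_nh
    return a, mse
```

For a zero-mean series the predictor is then $\sum_i a_i X_{n+1-i}$; for a nonzero mean, apply it to the mean-corrected series as in the formula above.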
Example AR(1)
$X_t = \phi X_{t-1} + Z_t$, where $|\phi|<1$ and $\{Z_t\} \sim WN(0, \sigma^2)$.
We try $a_1=\phi$ and $a_k = 0$ for $k = 2, \cdots, n$.
A solution that works is $\underset{\sim}{a_n} = (\phi, 0, \cdots, 0)$, so $P_nX_{n+1} = \underset{\sim}{a_n}^T\underset{\sim}{X_n} = \phi X_n$.
$$E\big((X_{n+1} - P_nX_{n+1})^2\big) = \gamma(0) - \underset{\sim}{a_n}^T \underset{\sim}{\gamma_n}(1) =\frac{\sigma^2}{1-\phi^2} - \phi\gamma(1) =\frac{\sigma^2}{1-\phi^2} - \phi\cdot\frac{\sigma^2\phi}{1-\phi^2} = \sigma^2$$
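A numerical confirmation of this example, assuming arbitrary illustrative values of $\phi$, $\sigma$, $n$: solving the system recovers $(\phi, 0, \cdots, 0)$ and the one-step MSE $\sigma^2$.

```python
import numpy as np

phi, sigma, n = 0.7, 1.0, 10             # illustrative values

def gamma(h):
    return sigma**2 * phi**abs(h) / (1 - phi**2)   # AR(1) autocovariance

idx = np.arange(1, n + 1)
Gamma_n = np.array([[gamma(i - j) for j in idx] for i in idx])
gamma_n1 = np.array([gamma(1 + j - 1) for j in idx])   # h = 1

a = np.linalg.solve(Gamma_n, gamma_n1)
print(np.round(a, 10))                      # ~ (phi, 0, ..., 0)
print(gamma(0) - a @ gamma_n1, sigma**2)    # MSE ~ sigma^2
```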
Now let $Y$ and $W_1, \cdots, W_n$ be any random variables with finite second moments, means $\mu_Y = E(Y)$ and $\mu_i = E(W_i)$, and covariances $\operatorname{Cov}(Y, Y)$, $\operatorname{Cov}(Y, W_i)$, $\operatorname{Cov}(W_i, W_j)$.
Let $\underset{\sim}{W} = (W_n, \cdots, W_1)$, $\underset{\sim}{\mu} = (\mu_n, \cdots, \mu_1)$, $\underset{\sim}{\gamma} = \operatorname{Cov}(Y, \underset{\sim}{W})$, and $\Gamma = \operatorname{Cov}(\underset{\sim}{W}, \underset{\sim}{W})$, so that $\Gamma_{ij} = \operatorname{Cov}(W_{n+1-i}, W_{n+1-j})$.
By exactly the same argument as before, the best linear predictor of $Y$ given $\underset{\sim}{W}$ is:
$$P(Y|\underset{\sim}{W}) = \mu_Y + \underset{\sim}{a}^T(\underset{\sim}{W} - \underset{\sim}{\mu_W})$$

where $\underset{\sim}{a} = (a_1, \cdots, a_n)$ is any solution of $\Gamma\underset{\sim}{a} = \underset{\sim}{\gamma}$.
Returning to the AR(1), assume that we observe $X_1$ and $X_3$ but not $X_2$.
Let $Y=X_2$ and $\underset{\sim}{W}=(X_1, X_3)$. We have
$$\Gamma = \begin{pmatrix} \frac{\sigma^2}{1-\phi^2} & \frac{\phi^2\sigma^2}{1-\phi^2}\\ \frac{\phi^2\sigma^2}{1-\phi^2} & \frac{\sigma^2}{1-\phi^2} \end{pmatrix}$$
$$\underset{\sim}{\gamma} = \begin{pmatrix} \operatorname{Cov}(X_2, X_1) \\ \operatorname{Cov}(X_2, X_3) \end{pmatrix} = \begin{pmatrix} \frac{\sigma^2\phi}{1-\phi^2} \\ \frac{\sigma^2\phi}{1-\phi^2} \end{pmatrix}$$
As $\Gamma\underset{\sim}{a} = \underset{\sim}{\gamma}$, dividing both sides by $\frac{\sigma^2}{1-\phi^2}$ gives

$$\begin{pmatrix} 1 & \phi^2\\ \phi^2 & 1 \end{pmatrix}\underset{\sim}{a} = \begin{pmatrix} \phi\\ \phi \end{pmatrix}$$
So

$$\underset{\sim}{a} = \begin{pmatrix} \frac{\phi}{1+\phi^2}\\ \frac{\phi}{1+\phi^2} \end{pmatrix}$$
$$P(X_2|X_1, X_3) = \frac{\phi}{1+\phi^2}X_1 + \frac{\phi}{1+\phi^2}X_3 = \frac{\phi}{1+\phi^2}(X_1+X_3)$$

(the mean term drops out since $\mu = 0$ here).
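The same interpolation written as a two-dimensional linear solve, assuming arbitrary illustrative values of $\phi$ and $\sigma$:

```python
import numpy as np

phi, sigma = 0.7, 1.0                                  # illustrative values
g = lambda h: sigma**2 * phi**abs(h) / (1 - phi**2)    # AR(1) autocovariance

# W = (X_1, X_3), Y = X_2:
Gamma = np.array([[g(0), g(2)],
                  [g(2), g(0)]])
gamma_vec = np.array([g(1), g(1)])     # Cov(X_2, X_1), Cov(X_2, X_3)

a = np.linalg.solve(Gamma, gamma_vec)
print(a, phi / (1 + phi**2))           # both entries equal phi / (1 + phi^2)
```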
$P(\cdot|\underset{\sim}{W})$ is a prediction operator, and it has useful properties:
Let $E(U^2) < \infty$, $E(V^2) < \infty$, $\Gamma=\operatorname{Cov}(\underset{\sim}{W}, \underset{\sim}{W})$, and let $\beta, a_1, \cdots, a_n$ be constants.
1. $P(U|\underset{\sim}{W}) = E(U) + \underset{\sim}{a}^T(\underset{\sim}{W} - E(\underset{\sim}{W}))$, where $\Gamma\underset{\sim}{a} = \operatorname{Cov}(U, \underset{\sim}{W})$
2. $E\big((U-P(U|\underset{\sim}{W}))\underset{\sim}{W}\big) = \underset{\sim}{0}$ and $E\big(U-P(U|\underset{\sim}{W})\big)=0$
3. $E\big((U-P(U|\underset{\sim}{W}))^2\big) = \operatorname{Var}(U) - \underset{\sim}{a}^T\operatorname{Cov}(U, \underset{\sim}{W})$
4. $P(a_1U+a_2V+\beta|\underset{\sim}{W}) = a_1P(U|\underset{\sim}{W}) + a_2P(V|\underset{\sim}{W})+\beta$
5. $P\big(\sum_{i=1}^na_iW_i + \beta\,\big|\,\underset{\sim}{W}\big) = \sum_{i=1}^na_iW_i+\beta$
6. $P(U|\underset{\sim}{W}) = E(U)$ if $\operatorname{Cov}(U,\underset{\sim}{W})=\underset{\sim}{0}$
7. $P(U|\underset{\sim}{W})=P\big(P(U|\underset{\sim}{W},\underset{\sim}{V})\,\big|\,\underset{\sim}{W}\big)$
(Note: for property 7, $P(U|\underset{\sim}{W}, \underset{\sim}{V})=\mu_U+a_W(\underset{\sim}{W}-\mu_W)+a_V(\underset{\sim}{V}-\mu_V)$, and then $P(U|\underset{\sim}{W}) = P\big(\mu_U + a_W(\underset{\sim}{W}-\mu_W)+a_V(\underset{\sim}{V}-\mu_V)\,\big|\,\underset{\sim}{W}\big)$.)
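A small simulation sketch of properties 1 and 2, assuming an arbitrary joint Gaussian distribution for $(U, W_1, W_2, W_3)$ (all numbers are illustrative): the prediction error of the linear predictor is empirically uncorrelated with $\underset{\sim}{W}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw (U, W_1, W_2, W_3) from a multivariate normal with a random covariance.
A = rng.normal(size=(4, 4))
S = A @ A.T                                   # population covariance matrix
samples = rng.multivariate_normal(np.zeros(4), S, size=200_000)
U, W = samples[:, 0], samples[:, 1:]

# Property 1: P(U|W) = E(U) + a^T (W - E(W)), with Gamma a = Cov(U, W).
a = np.linalg.solve(S[1:, 1:], S[0, 1:])
pred = W @ a                                  # means are zero here

# Property 2: the prediction error is (empirically) uncorrelated with W.
resid = U - pred
print(np.round(resid @ W / len(U), 3))        # ~ (0, 0, 0)
```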
Assume $\{X_t\}$ is a stationary process with mean $0$ and autocovariance function $\gamma(\cdot)$. We can solve for $\underset{\sim}{a}$ to determine $P_nX_{n+h}$ in terms of $\{X_n, \cdots, X_1\}$.
However, for large $n$, inverting $\Gamma_n$ is not fun!
Perhaps we can use the linearity of $P_n$ to do recursive prediction of $P_{n+1}X_{n+h}$ from $P_nX_{n+1}$.
If $\Gamma_n$ is non-singular, then $P_nX_{n+1} = \underset{\sim}{\phi_n}^T\underset{\sim}{X} = \phi_1X_n + \cdots + \phi_nX_1$, where $\underset{\sim}{\phi_n}$ solves $\Gamma_n\underset{\sim}{\phi_n} = \underset{\sim}{\gamma_n}(1)$.